AD-A208  013 


Technical  Document  1510 
April  1989 


DTIC 


Keyword  Feedback  for 
Improving  Speech 
Recognition  in  Command 
and  Control 
Information-Acquisition 
Tasks 


Stephen  W.  Nunn 


Approved for public release; distribution is unlimited.




NAVAL  OCEAN  SYSTEMS  CENTER 

San  Diego,  California  92152-5000 


E. G. SCHWEIZER, CAPT, USN
Commander


R.  M.  HILLYER 
Technical Director


ADMINISTRATIVE  INFORMATION 

Work was performed under DARPA funding for the User Interface Technology
Branch, Code 441. This report summarizes work done from November 1986 through April
1987.

Released  by 

W.  T.  Rasmussen,  Head 
Command  Support  Technology 
Division 


Under  authority  of 
R.  C.  Kolb,  Head 
Command  and  Control 
Department 


UNCLASSIFIED 

SECURITY  CLASSIFICATION  OF  THIS  PAGE 


REPORT  DOCUMENTATION  PAGE 


1a. REPORT SECURITY CLASSIFICATION
UNCLASSIFIED 


2a.  SECURITY  CLASSIFICATION  AUTHORITY 


2b. DECLASSIFICATION/DOWNGRADING SCHEDULE


4.  PERFORMING  ORGANIZATION  REPORT  NUMBER(S) 

NOSC  TD  1510 


6a.  NAME  OF  PERFORMING  ORGANIZATION 
Naval Ocean Systems Center


6b. OFFICE SYMBOL
(if applicable)


6c. ADDRESS (City, State and ZIP Code)


San  Diego,  CA  92152-5000 


8a. NAME OF FUNDING/SPONSORING ORGANIZATION
Defense Advanced Research Projects Agency


8b. OFFICE SYMBOL
(if applicable)

DARPA/ISTO 


1b.  RESTRICTIVE  MARKINGS 


3. DISTRIBUTION/AVAILABILITY OF REPORT


Approved for public release; distribution is unlimited.


5.  MONITORING  ORGANIZATION  REPORT  NUMBER(S) 


7a.  NAME  OF  MONITORING  ORGANIZATION 


7b. ADDRESS (City, State and ZIP Code)


9. PROCUREMENT INSTRUMENT IDENTIFICATION NUMBER


8c. ADDRESS (City, State and ZIP Code)

10. SOURCE OF FUNDING NUMBERS

PROGRAM  ELEMENT  NO. 

PROJECT  NO. 

TASK  NO. 

AGENCY 
ACCESSION  NO. 

1400  Wilson  Blvd. 

Arlington,  VA  22209 

ITSOiHOTE 

-eE2-7449A 

11. TITLE (include Security Classification)

KEYWORD  FEEDBACK  FOR  IMPROVING  SPEECH  RECOGNITION  IN  COMMAND  AND  CONTROL 
INFORMATION-ACQUISITION  TASKS 

12. PERSONAL AUTHOR(S)

Stephen  W.  Nunn 

13a.  TYPE  OF  REPORT 

Final 

13b.  TIME  COVERED 

FROM  Nov  1986  TO  Apr  1987 

14. DATE OF REPORT (Year, Month, Day)

April  1989 

15. PAGE COUNT

23 

16.  SUPPLEMENTARY  NOTATION 

17. COSATI CODES

FIELD 

GROUP 

SUB-GROUP 

18. SUBJECT TERMS (Continue on reverse if necessary and identify by block number)

speech  recognition 
keyword  spotting 
information  acquisition 
natural  language 


19. ABSTRACT (Continue on reverse if necessary and identify by block number)


This  report  addresses  a  number  of  important  issues  involving  the  implementation  of  continuous  speech  recognition  for  information- 
acquisition  tasks.  The  following  questions  are  addressed: 

1. For a restricted information acquisition task such as database query, what can be considered natural speech?

2.  How  can  the  syntax  be  constrained  to  improve  accuracy  but  at  the  same  time  still  allow  natural  speech? 

3.  How  effectively  can  feedback  be  used  to  shape  or  influence  the  user’s  speech?  Can  the  user  adapt  to  a  limited  vocabulary  and 
restricted  syntax? 

4.  For  an  appropriate  task,  can  a  continuous  speech  recognizer  configured  as  a  word  spotter  be  used  to  partially  remove  syntax 
restrictions? 

An information acquisition experiment is described. Results are presented and user interaction and reaction are discussed.


20. DISTRIBUTION/AVAILABILITY OF ABSTRACT

UNCLASSIFIED/UNLIMITED   SAME AS RPT   DTIC USERS


22a. NAME OF RESPONSIBLE PERSON
Stephen W. Nunn


21. ABSTRACT SECURITY CLASSIFICATION

UNCLASSIFIED 


22b. TELEPHONE (include Area Code)
(619) 553-0654


22c.  OFFICE  SYMBOL 
Code  441 


DD FORM 1473, 84 JAN


83 APR EDITION MAY BE USED UNTIL EXHAUSTED
ALL OTHER EDITIONS ARE OBSOLETE



CONTENTS 


Introduction
Background
Query Language
Previous Research
Limitations
Speech-Recognition Experiment
Method
Subjects
Procedure
Apparatus
Experimental Design
Results
Vocabulary and Query Syntax Structures
Amount of Information Requested
Unacceptable Queries
Discussion
Vocabulary
Query Complexity
Approaches to Speech Recognition
Isolated-Word Recognition
Continuous Speaker-Dependent Speech Recognition
Keyword Spotting
Conclusions
Recommendations
Bibliography
Appendix: Status Boards Showing the Three Scenarios



INTRODUCTION 


Our objective in the Speech Technology Group of the User Interface Technology
Branch at the Naval Ocean Systems Center (NOSC) in San Diego is to apply
speech technology to naval command and control systems. By employing databases
and expert systems to aid in decision making, naval command and control systems
are becoming more complex and sophisticated. The design of man-machine interfaces
to these systems is critical to user acceptance and functionality. We believe that the
use of speech recognition in many of these systems is not only possible, but vital to
shipboard operations.

Naval command and control operations present a number of technological
problems for speech-recognition systems. Such problems include high or variable
background noise environments, moderate stress situations, and the need for high
accuracy. It is therefore essential in naval applications to develop speech-recognition
interfaces that do not require extensive training, are relatively easy to operate, and
are robust.

Naval personnel are required to change stations and perform different functions
frequently. Their skill level and familiarity with systems can vary greatly. Each
system has its own operating commands and characteristics that must be known
before it can be operated. Speech recognition could provide a greatly needed
user-friendly interface to these systems that would only require the user to know what
kinds of functions the system is capable of performing. Such an interface would
greatly reduce the amount of training the user would need to communicate with the
system.


In summary, we are driven by the following requirements:

1. Relatively easy system training and use.

2. Robustness to environmental background noise and moderate stress
situations.

3. High accuracy.

4. Real-time response.

5. A user-friendly interface that requires minimal knowledge of internal
system operation. Ideally the user should only need to know what kind of
functions the system is capable of performing in system control operations
or what kind of information is available for information acquisition tasks.

One of the most critical command and control user interfaces is the access of
database and expert system information. Much time and effort have been spent on the
development of extensive real-time databases. We feel that these systems may be
little-used or even rejected if the user is required to learn a special language for
communicating with each system.

The focus of this paper is two-fold: First, the application of speech recognition
in limited command and control tasks requiring the access of database information,
and second, an approach to the speech-recognition problem based on keyword-spotting
concepts that can potentially meet speech-recognition requirements in
command and control operations.

BACKGROUND 


QUERY  LANGUAGE 

In considering the use of speech recognition for the access of a database, it is
of some benefit to examine issues which have arisen with traditional database access
using a keyboard.

There has been considerable debate over which query language is best: a formal
query language having a very constrained syntax and vocabulary, or a
natural query language that is relatively unconstrained.

Ogden and Brooks (1983) provide a thorough comparison of formal and natural
languages. Formal query languages have the advantage of having a constrained
language which teaches a concise and unambiguous way of communicating with the
computer. However, there are a number of disadvantages in using formal query
languages. These languages require the user to have an explicit model of the database,
i.e., all database attributes and their relationships must be known. Extensive training
is required to learn this model and the constraints of the language. Even after
training, errors are often made in spelling, syntax, and punctuation. Formal language
queries can also be overly verbose and complicated. In many cases, there is a
more concise way of asking a question with a natural language.

There are a number of advantages to using natural query languages (Ogden
and Brooks). More people would be able to access database information if they could
use their own natural language. Natural language also eliminates the need to
remember a great deal of notation that is irrelevant to the problem and detracts from the
user’s ability to concentrate on the problem (Ehrenreich, 1981). In addition, users
need only describe the data they want retrieved and do not have to be concerned
about how that data is retrieved.

However, natural-language interaction is not always the optimal solution.
Natural query languages have a number of disadvantages. Training is still required
for users to learn the constraints of the language, and even with training, users often
do not understand hidden constraints and sometimes ask illegal queries. The inherent
ambiguity of the English language can also cause problems.

The arguments for and against the use of natural language with speech
recognition are similar to those for and against the use of a keyboard. However, the issue
of understanding the constraints of the system becomes even more critical because
the user must deal not only with the constraints of the speech-recognition system but
also with the constraints of the natural-language interpreter. This is particularly a
problem with speech. Owing to the “naturalness” of speech, users are inclined to
speak freely, forgetting any artificial constraints.




With  keyboard  input,  users  expect  to  be  required  to  learn  special  commands 
and  constraints  to  communicate.  But  users  are  often  unwilling  to  learn  any  artificial 
constraints  with  speech. 


PREVIOUS  RESEARCH 

An important concern is the degree to which syntax and vocabulary can be
restricted, but still be “habitable.”* To try to gain an understanding of the problem, we
needed to investigate how naval officers would naturally query a database. In
preliminary research, Bemis (1986) gained valuable information concerning the variability in
vocabulary and syntax in questioning styles. Bemis found that naval officers will
often speak tersely and use common syntactic constructions. This result was not
unexpected, as naval officers are taught to speak concisely and use common terminology
in all communication.

Data gathered in this initial experiment aided in assessing the requirements
for a speech interface to a database. Working with ITT Defense Communications
Division (ITTDCD), we developed a speech-recognition/natural-language interface for a
naval battle management task. In this application, natural language was used to
access information in a naval taskforce database. Figure 1 shows the basic system
configuration.

[Figure 1. Speech-recognition/natural-language interface. Block diagram: speech -> ITT
continuous speech recognizer -> natural-language processor (NLP) -> database
management system (DBMS).]


The ITT recognizer is a continuous, speaker-dependent, real-time system.
Syntax is controlled with the use of a finite state grammar. The ITT recognizer provides
high recognition accuracy and rapid training and is very robust even under high
noise conditions.

The natural-language processor (NLP), developed by Janet Haake of ITTDCD,
works as a keyword processor, processing and interpreting only certain words of a
query. When a question is recognized, it is passed to the natural-language processor,
which translates the query into the database query language commands. These
commands are then sent to the Database Management System (DBMS), in this case
DBASE III, which retrieves the information from the database.

* A habitable language is one in which the user can speak freely within a given task,
but still remain within the limits set by the syntax and vocabulary.

The parsing procedure is described by Haake et al. (1987). The NLP parses the
query one word at a time from left to right. The parsed word is looked up in a
dictionary to determine if it is a keyword. If it is found, then a particular action is
triggered, based on information contained in the dictionary for that particular word.
If it is not in the dictionary, i.e., if it is not a keyword, the word will be ignored, and
the parser will go on to the next word in the query.
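
The left-to-right keyword lookup described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the dictionary entries, the action functions, and the shape of the resulting "frame" are hypothetical and are not taken from the ITTDCD processor.

```python
# Hypothetical keyword dictionary: each keyword maps to an action that
# updates a query frame. Entries are illustrative, not the ITTDCD lexicon.
def set_field(field):
    def action(frame, word):
        frame["fields"].append(field)
    return action


def set_ship(frame, word):
    frame["ships"].append(word)


DICTIONARY = {
    "location": set_field("LOCATION"),
    "speed": set_field("SPEED"),
    "wichita": set_ship,
    "kiska": set_ship,
}


def parse_query(query):
    """Scan the query left to right, acting only on dictionary keywords."""
    frame = {"fields": [], "ships": []}
    for raw in query.split():
        word = raw.strip("?.,!").lower()
        action = DICTIONARY.get(word)
        if action is None:
            continue  # not a keyword: ignore it and go on to the next word
        action(frame, word)
    return frame


print(parse_query("What is the location of Wichita?"))
print(parse_query("Wichita location"))
```

Both calls yield the same frame, which is the point of the approach: words outside the dictionary, and the grammatical form of the query, do not affect the result.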

This keyword-processing approach to query interpretation has many
advantages over a strict syntax-driven interpreter. The dictionary does not need to include
all possible words that might occur in a query, only the keywords that carry
semantic information. More important, the user has greater flexibility in structuring
queries. The queries can be phrased in whatever form is most natural. In addition,
the query need not even be grammatically correct. All that is necessary is for the key
information-carrying words to be in the query.


LIMITATIONS 

This keyword approach is very powerful and robust because (a) the task is
limited and concise and there is little implied information being carried in the queries;
(b) the queries are concerned only with obtaining basic information from the
database; and (c) no reasoning or deduction is required to interpret the queries or retrieve
the desired information from the database (Haake et al.).

The greatest limitation of this system is the speech recognizer’s internal
syntax. The use of keyword spotting by the NLP is not as beneficial as it could be
because of the tight syntax restrictions of the recognizer. A reasonable approach
would be to configure the recognizer in a word-spotting mode, thereby eliminating
many of the syntax restrictions imposed by the finite-state grammar of the
recognizer.


There are a number of questions concerning the feasibility of keyword
spotting for a real application. The primary concern is whether the performance of the
recognizer configured in the word-spotting mode will be adequate for
natural-language queries. In addition, complex questions can create ambiguity for a
keyword-spotting system. It is also unknown how extensive a task the keyword-spotting
approach will be capable of handling.

The initial limited experiment (Bemis) generated these and many more
questions. Another experiment was needed that would gather more realistic data.
However, the complexity of the problem had to be limited enough to control the
vocabulary used and types of questions that would be asked. An information-acquisition
experiment was designed to study naval officers’ natural spontaneous forms of
questioning and to determine an approach to speech recognition that can best take
advantage of a keyword natural-language processor. The rest of this report is devoted
to that experiment.




SPEECH-RECOGNITION  EXPERIMENT 


METHOD 

Subjects 

The  subjects  for  this  study  were  nine  naval  officers  currently  stationed  at 
NOSC.  They  had  a  variety  of  duty  experiences  in  the  Navy,  but  none  had  used  speech 
recognition  equipment  before. 

Procedure 

The experimental sessions were conducted with each subject individually. The
subjects were told that they were assisting us in the evaluation of speech recognition
for use in naval command and control operations. They were also told that they
would be speaking to a speech recognizer during the experiment. In reality, all
questions asked by the subjects were monitored in an adjacent room by the experimenters.
The subjects’ questions and their answers were entered by the experimenters on their
terminal and were immediately sent to the subjects’ terminal.

The subjects were presented a naval scenario. They were asked a question
concerning the scenario that typically could have been asked by their commanding
officer. On the terminal in front of them, a blank status board was shown that
represented the information available to them from a database. The subjects’ task was to
ask questions designed to obtain enough information to answer the question posed to
them.


No restriction was placed on the vocabulary or the syntax subjects could use,
except that they were told to “Ask questions to acquire information from the status
board that will be displayed.” There was no time limit imposed on each session.

Apparatus 

The experiment was run on a Masscomp 5600 minicomputer. The task and
data-collection programs were written in the C programming language by the author
of this report. The subjects sat in one room and the experimenters in an adjacent
room. The scenario was presented to the subjects on a WYSE 50 terminal, the same
type of terminal used by the experimenters for controlling the experiment. A
Panasonic WV-3400 video camera and an NV-8420 video tape recorder were used to
videotape the subjects. The experimenters viewed the subjects during the experiment
on a Sony 19-inch PVM-1900 color monitor. A Shure SM-10 headset microphone was
used to record the subjects’ speech. Figure 2 illustrates the test setup.


[Figure 2. Experimental test setup. The diagram shows the subject's room and the
experimenters' room separated by a wall.]


EXPERIMENTAL  DESIGN 

The experimental variable was feedback, which had two conditions. In the first, there
was complete feedback of subject queries, while in the second, feedback was restricted
to only the keywords in the query. These conditions were chosen to determine the
degree to which the subjects’ vocabulary and syntax can be shaped by feedback. It was
critical to ensure that the subjects’ queries would be as natural and spontaneous as
possible. The approach taken was to elicit questions from the subjects by creating
three simple scenarios and proceeding in the following manner:

• For each scenario, information describing the capabilities of a particular
ship of interest was given as part of the subjects’ instructions. The three
scenarios involved the USS Texas, the USS Brewton, and the USS
Callahan, respectively.

•  The  ship  had  some  kind  of  problem  that  must  be  repaired. 

•  A  question  was  posed  to  the  subject  concerning  the  problem. 

• To answer the question, the subject was required to ask questions to
retrieve information to “fill in” a blank status board displayed on his
screen. Initially, the status board contained only information categories
shown in columns across the top and available ships listed vertically along
the left side of the display. (The status boards, along with each of the three
questions posed to the subjects, are presented in the appendix.) The goal of
this approach of data presentation to the subjects was to avoid biasing the
syntax of the questions. Although this approach would tend to bias the
subjects’ vocabulary, it would be biased in a natural direction for users of a
database, i.e., users must have some idea of the categories of information
in a database to use it at all.

•  When  the  subject  retrieved  enough  information,  he  stated  his  solution.  The 
exact  solution  was  inconsequential.  The  primary  concern  was  to  analyze 
questions  framed  by  the  subject. 

•  After  a  solution  was  stated,  the  screen  cleared,  and  the  next  scenario  was 
displayed. 

Four dependent measures were used to assess subject performance:

1. Vocabulary. The ratio of keywords to nonkeywords was calculated for each
query.

2. Length of Queries. Total words per query was calculated and averaged for
each of the three questions.

3. Syntactic Structures and Query Complexity. Syntactic structures were
divided into major categories. The quantity of information asked for was also examined.

4. Unacceptable Queries. The number of unacceptable queries was totaled for
each question and for the entire session.
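
As a rough illustration of the first two measures, the following sketch computes a keyword-to-nonkeyword ratio and a word count for a single query. The keyword set, the example queries, and the handling of a query with no nonkeywords are assumptions of this sketch, not details given in the report.

```python
def query_measures(words, keywords):
    """Return (keyword/nonkeyword ratio, total word count) for one query."""
    n_key = sum(1 for w in words if w.lower() in keywords)
    n_non = len(words) - n_key
    # A query made only of keywords has no nonkeywords to divide by;
    # returning infinity in that case is an assumption of this sketch.
    ratio = float("inf") if n_non == 0 else n_key / n_non
    return ratio, len(words)


KEYWORDS = {"location", "wichita"}  # hypothetical keyword set

verbose = "what is the location of wichita".split()
terse = "location of wichita".split()

print(query_measures(verbose, KEYWORDS))  # (0.5, 6)
print(query_measures(terse, KEYWORDS))    # (2.0, 3)
```

The terser query has both the higher ratio and the lower word count, which is the direction of change the measures are meant to detect.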

RESULTS 

The preliminary results of the experiment showed that subjects did have their
responses shaped by restricting the feedback to keywords. An analysis of variance was
performed on the data. The number of keywords used compared to nonkeywords as
expressed in a keyword-nonkeyword ratio was significantly higher for the restricted
group. In this analysis, the larger the keyword-nonkeyword ratio, the more terse the
query. The subjects in the restricted group also used significantly fewer words overall
to make queries.

Vocabulary  and  Query  Syntax  Structures 

The subjects’ vocabulary was broken up into the following categories:

C = Command or query (classified as nonkeywords)
Q = Keyword qualifier
D = Keyword data from the status board
S = Keyword ship names or the word “ships”

All other words were considered nonkeywords and were ignored for this
analysis.


Following is a list of the five most common syntactic types:

Query Type      Example
1. C S Q        What ships are in CENTPAC?
2. C S Q D      What ships in the battlegroup have helos?
3. C Q S        What’s the location of Wichita?
4. C Q S Q      How far is Kiska from Callahan?
5. C Q D S      Is there a CASREP on the SLQ-32 on Wichita?
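
The category codes and example queries above suggest a simple way to derive a query's syntactic type: tag each word with its category and drop untagged words. The sketch below does this with a small word-level lexicon. The lexicon is hypothetical and covers only four of the examples; type 4 is left out because there a ship name (Callahan) acts as a qualifier, which a context-free word lookup like this one cannot express.

```python
import re

# Hypothetical lexicon mapping lowercase words to the report's categories.
LEXICON = {
    "what": "C", "what's": "C", "is": "C",
    "ships": "S", "wichita": "S",
    "centpac": "Q", "battlegroup": "Q", "location": "Q", "casrep": "Q",
    "helos": "D", "slq-32": "D",
}


def query_type(query):
    """Return the category sequence of a query, skipping untagged words."""
    tokens = re.findall(r"[a-z0-9'-]+", query.lower())
    return " ".join(LEXICON[t] for t in tokens if t in LEXICON)


print(query_type("What ships are in CENTPAC?"))                   # C S Q
print(query_type("What ships in the battlegroup have helos?"))    # C S Q D
print(query_type("What's the location of Wichita?"))              # C Q S
print(query_type("Is there a CASREP on the SLQ-32 on Wichita?"))  # C Q D S
```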

An analysis of syntactic types versus feedback condition revealed that
restricted feedback apparently tends to limit syntax to the two simplest query forms.
The following table summarizes the results:


Query Type    % Used by Restricted Group    % Used by Complete Group
1             38                            1.5
2             0                             7
3             57                            65
4             1                             14.5
5             0                             5
All others    4                             7
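
The share of queries falling into the two simplest forms can be checked directly from the tabulated percentages. The sketch below assumes that "the two simplest query forms" means the three-element types 1 (C S Q) and 3 (C Q S); the report does not name them explicitly, but this reading reproduces the 95% figure quoted later for the restricted-feedback condition.

```python
# Percentages from the table: query type -> (restricted %, complete %).
USAGE = {
    "1": (38.0, 1.5),
    "2": (0.0, 7.0),
    "3": (57.0, 65.0),
    "4": (1.0, 14.5),
    "5": (0.0, 5.0),
    "others": (4.0, 7.0),
}

# Assumption: the "two simplest forms" are the three-element types 1 and 3.
SIMPLEST = ("1", "3")

restricted = sum(USAGE[t][0] for t in SIMPLEST)
complete = sum(USAGE[t][1] for t in SIMPLEST)

print(restricted, complete)  # 95.0 66.5
```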

No definite conclusions can be drawn from the complete feedback data.
However, it is possible that the large percentage of type-3 questions by both the restricted
feedback and complete feedback groups can be attributed to the influence of their
shared naval background. In the Navy, certain terms and phrases are standardized.

Amount  of  Information  Requested 

The complexity range of the questions subjects could ask was analyzed. It was
decided that queries asking for information for multiple ships from a single category
would be allowed, but that questions across different categories would not be allowed.
This restriction is mainly due to the current capability of the natural-language
processor.


Results indicated that the feedback condition had little or no effect on the
amount of information requested. However, a definite trend developed in both
conditions as subjects progressed from the first to the third scenario. In the first scenario,
the subjects’ questions were predominantly requesting only single pieces of
information. By the third scenario, the majority of questions were asking for two or more
pieces of information.




Unacceptable  Queries 

Unacceptable queries were generally of two types: (1) a correction made
during or immediately following the question; (2) the question asked for multiple data
items across categories. Only 6% of the total queries were considered unacceptable.
No significant relationship was found between speech errors and feedback condition.

DISCUSSION 

The primary goals of this study were:

1. To gain an understanding of spontaneous questioning styles used by
subjects to acquire information for a problem-solving task.

2. To determine if the type of feedback presented affects the vocabulary and
syntax of subjects’ questions.

3. To determine if a minimal syntactic, keyword approach to speech
recognition and interpretation can be used for natural-language access to a database.


VOCABULARY 

The restricted-feedback condition clearly shaped subjects’ queries toward the
use of keywords. Subjects commented that when they realized that all the
information that was needed by the speech-recognition system was the keywords, they began
using the keywords predominantly. Subjects felt that the use of keywords was the
quickest and easiest way to retrieve information.

For information-acquisition tasks such as database retrieval, the effect of the
constraints on natural language can be minimized if the organization of the database
corresponds to an organization the user perceives as natural (Ehrenreich). Given a
“natural” organization to a database, subjects had little trouble accessing information
using the retrieval keys or their synonyms. For example, the category “DISTANCE
TO <SHIPNAME>” was listed as information directly stored in the database. This
is a more natural category of information than simply storing ship-location
information, which would require the user to calculate distances.


QUERY  COMPLEXITY 

Previous research indicates that people will tend to home in on desired
information. Instead of asking for a single complex piece of data, people will request one
or more simple data sets. This allows people to judge the important set relations of
the data (Ehrenreich). Our results support this tendency. Subjects tended to break up
the problem into small subproblems that could be solved by retrieving one piece of
information or a related group of data. This approach to information acquisition also
resulted in subjects asking fairly syntactically simple questions. In the
restricted-feedback condition, 95% of the queries were one of the two simplest syntax forms.
This supports the use of a word-spotting technique for recognition and interpretation.



When subjects found that the system could not handle a particular question,
they easily rephrased it to retrieve the information they needed.


APPROACHES  TO  SPEECH  RECOGNITION 

The results of the study indicate that the use of speech recognition for
natural-language access to a database is quite promising. The next question is, which
approach to speech recognition would be the most effective? Current
speech-recognition technology provides several choices for a
speech-recognition/natural-language interface.

Isolated-Word Recognition

There are a number of applications for isolated-word recognizers in command
and control operations. Several 1000-word discrete recognizers are available that
work quite well. However, for applications such as database query, the requirement
of speaking with distinct pauses between words is unacceptable. Isolated-word
recognition works well for applications where only single commands are given, but when
you allow people to speak using natural syntax and vocabulary, it is only natural for
them to want to speak continuously.

Continuous  Speaker-Dependent  Speech  Recognition 

Continuous speaker-dependent speech-recognition technology has been
available for the past few years. In general, acceptable performance is achieved only with
very tight syntax restrictions and small branching factors. Continuous recognition
systems can be configured without a syntax merely by having all words eligible at all
times. The advantage of this approach is that the users do not need to remember how
they must phrase questions. Unfortunately, natural language is made up of many
short function words that can be very difficult to recognize when competing against
each other. With current technology, even with a small-to-moderate-size vocabulary,
the recognition of natural language without a syntax or language model cannot be
done with acceptable accuracy.

The advantage of using a complete syntax that defines all allowable sentences
is that moderate-size vocabularies with a large number of syntax nodes can be
implemented while maintaining an acceptable performance level. The disadvantage is
that people cannot remember specific restrictions on how sentences must be phrased.
Even though the intended meaning of the question is the same and the words used may
be identical, if the question does not follow the syntax, the entire question will be
rejected. This would quickly lead users to reject the system.
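
The brittleness of a complete syntax can be illustrated with a minimal sketch. The
two-pattern grammar below is hypothetical (it is not the syntax of any recognizer
used in this work), and the question wording is invented for illustration; only the
ship name and category names are taken from the status boards in Appendix A.

```python
import re

# Hypothetical complete syntax: every allowable sentence must match one of
# these patterns exactly. The patterns are illustrative assumptions only.
ALLOWED_PATTERNS = [
    re.compile(r"what is the (class|max speed) of (\w+)"),
    re.compile(r"which ships have (casreps)"),
]


def accept(question: str) -> bool:
    """Return True only if the whole question matches an allowed pattern."""
    text = question.lower().strip(" ?")
    return any(p.fullmatch(text) for p in ALLOWED_PATTERNS)


# Same words, same intended meaning, different phrasing.
in_syntax = "What is the max speed of Horne?"
rephrased = "Of Horne, what is the max speed?"

print(accept(in_syntax))   # follows the syntax
print(accept(rephrased))   # identical words, different order
```

Under these assumptions the first question is accepted and the second is rejected
as a whole, even though a listener would treat them as the same request.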

Keyword Spotting

The concept of keyword spotting relies on the idea that, for certain applications,
only keywords will be necessary to communicate the intended meaning of a
sentence. This use of keyword spotting is quite different from the conventional use of
word spotting, where the goal is to survey large amounts of information taken from
noisy radio links having narrow bandwidths and to select conversations about topics
of special interest. These conditions make the task of word spotting difficult, because
the speaker is generally unknown, the channel distorts and adds noise, the conversation
vocabulary and syntax are unlimited, and the speech can be sloppily spoken (Lea,
1981). In these conventional applications, the approach to word spotting is to choose
appropriate word and subword vocabularies to compete with the keywords to be
recognized.

Research in word-spotting techniques indicates that the performance of a
word spotter is greatly influenced by the number of filler or nonkeyword templates.
In particular, the number of filler templates and their duration are the critical
parameters. As the number of fillers is increased or their duration is decreased, the
probability of false alarm decreases, but so does the probability of detecting the
correct keyword. The effect of adjusting these parameters is analogous to varying the
rejection threshold of a recognizer (Higgins and Wohlford, 1985).
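
The trade-off can be sketched by sweeping a rejection threshold over match scores.
The scores below are made-up toy values, not measurements from any recognizer, and
`rates` is a hypothetical helper. The sketch shows only that a stricter threshold
lowers false alarms and detections together, which is the behavior attributed above
to adding fillers or shortening them.

```python
# Toy match scores in [0, 1]; higher means a better match to a keyword
# template. These numbers are invented for illustration only.
KEYWORD_SCORES = [0.92, 0.85, 0.74, 0.66, 0.58]      # true keywords spoken
NONKEYWORD_SCORES = [0.70, 0.55, 0.41, 0.33, 0.20]   # other speech


def rates(threshold: float) -> tuple[float, float]:
    """Return (detection rate, false-alarm rate) at a rejection threshold."""
    detected = sum(s >= threshold for s in KEYWORD_SCORES)
    false_alarms = sum(s >= threshold for s in NONKEYWORD_SCORES)
    return detected / len(KEYWORD_SCORES), false_alarms / len(NONKEYWORD_SCORES)


# Sweep from a lenient threshold to a strict one.
for threshold in (0.3, 0.5, 0.7, 0.9):
    p_detect, p_false_alarm = rates(threshold)
    print(f"threshold={threshold:.1f}  P(detect)={p_detect:.1f}  "
          f"P(false alarm)={p_false_alarm:.1f}")
```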

The approach of keyword spotting for database query is substantially different
from that for conventional word-spotting applications. Several aspects of the
database query application would greatly improve the performance of the
keyword-spotting technique. The system would be speaker-dependent, and most of the
nonkeyword vocabulary would be known. Results of the experiment indicate that, in
general, users will ask fairly syntactically simple questions. The vocabulary used will
be restricted by the users' knowledge of the functions the system can perform (in
control operations) and the kind of information available (for database access). At the
acoustic level, information-carrying keywords are prominently stressed and clearly
articulated. This tends to minimize many of the problems associated with continuous
speech (such as coarticulation and missing segments of speech).

However, there are potentially serious drawbacks to using keyword spotting.
The context of the words is not known and cannot be used to reinforce or verify
decisions. Syntactically complex questions, in general, cannot be handled. The key to
minimizing these problems is that keyword spotting relies on another piece of
information: semantic knowledge, or intended meaning. The natural-language processor
does not have to know the interpretation of a word in all possible contexts, because
the user will have basic knowledge of what information is available before using the
system. It is not unreasonable to require the person using the database to know what
kind of information is in the database.
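
A minimal sketch of how keywords alone can carry the intended meaning of a query
is given below. The category names are column headings from the status boards in
Appendix A, and the ship names come from the first board. The synonym table, the
function name `spot_keywords`, and the example queries are assumptions made for
illustration, and the input is plain text standing in for recognizer output.

```python
# Column headings from the Appendix A status boards, with a few
# hypothetical synonyms a user might say instead of the heading.
CATEGORY_SYNONYMS = {
    "class": "CLASS",
    "type": "CLASS",            # assumed synonym
    "casreps": "CASREPS",
    "casualties": "CASREPS",    # assumed synonym
    "distance": "DISTANCE",
    "far": "DISTANCE",          # assumed synonym
    "speed": "MAX SPEED",
    "fast": "MAX SPEED",        # assumed synonym
}
SHIPS = {"jouett", "horne", "wainwright", "yorktown"}


def spot_keywords(utterance: str) -> dict[str, list[str]]:
    """Keep only ship names and category keywords; ignore everything else."""
    words = [w.strip("?,.'").lower() for w in utterance.split()]
    ships = [w.upper() for w in words if w in SHIPS]
    categories = []
    for w in words:
        category = CATEGORY_SYNONYMS.get(w)
        if category and category not in categories:
            categories.append(category)
    return {"ships": ships, "categories": categories}


# Different phrasings reduce to the same keyword set.
print(spot_keywords("What is the max speed of Horne?"))
print(spot_keywords("How fast is Horne?"))
print(spot_keywords("Horne speed"))
```

Because word order and function words are discarded, all three phrasings yield the
same ship and category. The same property is the source of the drawback noted
above: context is unavailable to confirm or reject a spotted keyword.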


CONCLUSIONS 

Feedback appears to be a powerful technique for controlling the vocabulary
and, to some extent, syntax of queries. It was originally thought that the effect of
using only keywords for feedback would occur slowly over time. However, the effect
can be seen almost immediately in subjects' queries.

Given a task whose "solution set" is limited to data that can be represented in
a tabular format corresponding to what users conceive of as natural for the task,
users will ask syntactically simple queries gathering small pieces of data at a time.

Restrictions imposed by a natural-language interpreter on query complexity are
not offensive to users. Syntactically simple queries seem to be preferred. As long as
users feel no restrictions on how to ask for information, the complexity of queries is
not an issue.

Categories of information presented as headings to subjects produce an
overwhelming tendency to use these keyword categories. However, subjects will also use
synonymous terms, depending on their approach to solving the problem.

A number of subjects preferred keywords over complete natural-language
sentences. Their interaction was very much goal driven; therefore, only the
vocabulary necessary to communicate the desired information was used.

RECOMMENDATIONS 

It became apparent that more problem-solving tasks were needed to allow
subjects to test the capability of the system. With more problems, it may be that
people will ask for more information per query. If each problem requires much of the
same information, people will get frustrated with asking for only one piece of
information.

The  table  data-presentation  format  may  have  constrained  the  complexity  of 
questions  people  would  ask.  Available  information  should  be  presented  as  a  list  of 
database  categories.  In  addition,  to  be  as  realistic  as  possible,  retrieved  information 
should  be  displayed  in  a  table. 

The  extent  to  which  current  speech-recognition  technology  can  be  used  for 
natural-language  database  access  depends  on  the  number  of  restrictions  imposed  on 
the user. Increased vocabulary size may facilitate natural speech interaction.
However, syntactic restrictions dominate the speech interaction long before the
vocabulary reaches 100 words (Schmandt, 1986).

In  preliminary  tests  with  the  ITT  recognizer  configured  in  a  word-spotting 
mode,  results  appear  very  promising.  Further  experiments  in  real  and  simulated 
command  and  control  situations  will  be  necessary  to  answer  remaining  questions. 


BIBLIOGRAPHY 


Bemis,  S.V.  1986.  “Analysis  of  verbal  natural  language  input  for  command  and 
control,”  Proceedings  of  Military  Speech  Tech  ’86  (in  press). 

Ehrenreich,  S.L.  1981.  “Query  languages:  design  recommendations  derived  from  the 
human  factors  literature,”  Human  Factors,  23,  709-725. 

Haake,  J.,  Benson,  P.,  and  Koble,  H.  1987.  “Automatic  speech  understanding  for 
naval  battle  management,”  Proceedings  of  the  Third  Annual  Artificial 
Intelligence  and  Advanced  Computer  Technology  Conference  (in  press). 

Higgins, A.L., and Wohlford, R.E. 1985. “Keyword recognition using template
concatenation,” Proceedings of International Conference on Acoustics, Speech
and Signal Processing, 1233-1236.

Lea, W.A. 1981. Trends in Speech Recognition, Englewood Cliffs, New Jersey:
Prentice-Hall, Inc.

Ogden,  W.C.  and  Brooks,  S.R.  1983.  “Query  languages  for  the  casual  user:  exploring 
the  middle  ground  between  formal  and  natural  languages,”  CHI  Proceedings, 
161-165. 

Petrick, S.R. 1976. “On natural language based computer systems,” IBM Journal of
Research and Development, 314-325.

Schmandt,  C.  1986.  “Problems  in  the  design  of  speech  interfaces  using  large 
vocabulary  recognizers,”  Proceedings  of  Speech  Tech  ’86,  157-159. 

Zoltan-Ford,  E.  1984.  “Reducing  variability  in  natural-language  interactions  with 
computers,”  Proceedings  of  the  Human  Factors  Society  -  28th  Annual  Meeting, 
768-772. 


Appendix  A 


STATUS  BOARDS  SHOWING 
THE  THREE  SCENARIOS 


AVAILABLE  SHIPS  IN  CENTPAC 
STATUS  BOARD 


SHIP  CLASS  SPEC.  EQUIP.  CASREPS  DISTANCE  TO  TEXAS  MAX  SPEED 


JOUETT 

HORNE 

WAINWRIGHT 

YORKTOWN 


Q1: Which available ships could replace Texas?


QUESTIONS? 


AVAILABLE  SHIPS  IN  CENTPAC 
STATUS  BOARD 


SHIP  CLASS  SPEC.  EQUIP.  CASREPS  DISTANCE  TO  TEXAS  MAX  SPEED 


MEYERKORD 

PATTERSON 

REASONER 

BRONSTEIN 


Q2: Which available ships can replace Brewton in the shortest amount of time?
QUESTION? 


AVAILABLE  SHIPS  IN  CENTPAC 
STATUS  BOARD 


SHIP  CLASS  SPEC.  EQUIP.  CASREPS  DISTANCE  TO  TEXAS  MAX  SPEED


BUCHANAN 

CHANDLER 

NICHOLSON 

BERKELEY 


Q3: Can any ship replace Callahan before her SLQ-32 is replaced?


QUESTION? 


